SAMPLING PLAN 


GOALS AND STANDARDS 


Goals 

• Collect information generalizable to the general adult 
population that meets characteristics as defined by each 
country. 

• For household-based surveys, use a multistage sampling plan. 

Standards 

• Use probability sampling. 

• Draw a different sample for each survey cycle. 
IMPORTANT CONSIDERATIONS


Working with a 
statistical consultant 
Well-trained or experienced staff are important, as they will be
better at persuading respondents to answer the survey. See Section 
4 of this guide for more information on staff recruitment and 
training. 

Timing of calls or visits is very important for telephone and
face-to-face surveys. Because out-of-home obligations tend to
follow group schedules, most households are at home at consistent
times. Try to identify any schedule patterns within different
groups or survey units.

The characteristics of the area or structure to be surveyed should 
give an idea of the efforts needed to make an initial contact. Urban 
areas with high population density tend to require more calls to 
make the first contact. Areas with high crime rates require more 
calls to first contact for face-to-face surveys. Large multiunit 
structures also require more effort to contact. Physical 
impediments to access (security doors, locked entrances, etc.) tend
to slow down contact. 

It is difficult and often impossible to fix problems after the fact. A 
statistician should be involved at the outset of your surveillance 
effort. The statistician should be familiar with the surveillance 
goals and objectives. 


Nonresponse error can create biased estimates. Because the
nonresponse rate falls as the response rate rises, surveys with
higher response rates will generally have lower nonresponse bias.
Nonresponse can occur at both the household and the person level.

There are different types of nonresponse in household surveys: 

• No one was contacted 

• The person contacted refused to answer the question 

• The person contacted was unable to respond (language 
problem, mental/physical health problem, etc.)

Nonresponse error is the extent to which respondents differ from 
nonrespondents on characteristics of importance in the survey. 
Therefore, nonresponse errors may introduce a bias in the survey 
results. Reducing nonresponse error consists of obtaining the best 
response rate possible. 
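
The link between response rate and bias can be illustrated with a short calculation. The sketch below assumes the standard decomposition in which the full-sample mean is a weighted mix of the respondent and nonrespondent means; the function name and the example prevalences are hypothetical, and in practice the nonrespondent mean is unknown and has to be approximated.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Approximate bias of the respondent mean relative to the full-sample mean.

    full mean = RR * respondents + (1 - RR) * nonrespondents, so
    bias      = respondents - full mean
              = (1 - RR) * (respondents - nonrespondents)
    """
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return (1.0 - response_rate) * (mean_respondents - mean_nonrespondents)


# Hypothetical example: 20% of respondents smoke, 30% of nonrespondents smoke.
for rr in (0.6, 0.8, 0.95):
    bias = nonresponse_bias(rr, 0.20, 0.30)
    print(f"response rate {rr:.0%}: bias {bias:+.3f}")
```

The bias shrinks as the response rate rises, and it vanishes when respondents and nonrespondents do not differ on the characteristic being measured.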

The survey method should be determined by which would be the 
most appropriate for the types of questions and the population 
surveyed. Studies have shown that some specific populations are 
more or less likely to respond to various survey methods. For 
example, less educated people are more likely to respond to a 
telephone survey than a mail survey, which requires reading and 
writing skills. 

When doing nonresponse followup, switching to another survey 
method can be effective. For example, if the initial method of 
collection was through a mail survey, switch to a telephone survey 
or face-to-face interview for the followup effort. 

Incentives can be offered to increase participation. These can 
include money, free health or lifestyle assessment, gifts, a lottery, 
and others that would be appropriate to a country or region. 

Make sure the respondents know that the information they provide 
will be confidential. Sending an advance letter to respondents can 
be effective if it is official and personalized. Increase the buy-in of 
the survey by explaining to the respondents how important it is that 
they respond to the questionnaire for the good of their community. 


Certain subgroups may need to be oversampled to ensure an adequate
number of respondents in those subgroups.

Creating sample
stratification

Stratification of the sample can be used to reduce sampling error.
In a stratified sample, the sampling error depends on the population
variance existing within rather than between the strata. Another
reason for stratification is that where marked differences exist
between subgroups of the population (e.g., urban, rural),
stratification allows flexible selection of sample allocation and
design separately for each subgroup.
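
The dependence on within-stratum variance can be sketched numerically. The example below assumes simple random sampling within each stratum and ignores the finite population correction; the stratum shares, prevalences, and sample sizes are hypothetical values chosen only for illustration.

```python
def stratified_mean_variance(strata):
    """Variance of a stratified mean.

    strata: list of (population_share, within_stratum_variance, sample_size).
    Only the variance inside each stratum enters the formula.
    """
    return sum(w ** 2 * s2 / n for w, s2, n in strata)


# Hypothetical prevalences: 30% in urban areas, 10% in rural areas.
urban = (0.6, 0.30 * 0.70, 600)   # share, p(1-p), sample size
rural = (0.4, 0.10 * 0.90, 400)
var_stratified = stratified_mean_variance([urban, rural])

# Simple random sample of the same total size from the combined population.
p_overall = 0.6 * 0.30 + 0.4 * 0.10
var_srs = p_overall * (1 - p_overall) / 1000

print(f"stratified variance: {var_stratified:.6f}")
print(f"SRS variance:        {var_srs:.6f}")
```

Because the strata differ in prevalence, the stratified variance is the smaller of the two in this example.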


Stratification is used only at the first stage of sampling. There is no 
benefit to stratifying at the dwelling or household selection stage. 

Decreasing sources
of error

Sources of error include noncoverage error, nonresponse error, unit
nonresponse, and item nonresponse.


Noncoverage error occurs because not all members of the general 
population are included in the sample. Population groups typically 
excluded from most general population surveys include persons 
living in nonresidential settings, such as hospitals, nursing homes, 
prisons, military bases, homeless shelters, and college dormitories. 
Compared with the size of the population of a country or region as 
a whole, the number of persons in these groups is generally small. 

Nonresponse error is the inability to obtain data for all persons in 
the sample population. This is a common problem in surveillance 
work. Information is presented below on reducing this type of 
error. 


Unit nonresponse occurs when an eligible sampling unit (a 
household, a person) does not respond or a respondent refuses to 
participate in the survey. Item nonresponse occurs when useful 
data are not obtained for a particular questionnaire item. 

Reducing
nonresponse error

With continuous surveillance, it becomes easier to deal with the
problem of nonresponse. It is possible to do some analysis on the
nonrespondents because we know what characteristics they have.
The next step would be to identify what nonresponse comes from
error, and what comes from bias.


attempts to compensate for such features of the sample design,
statisticians apply devices that tend to reduce variances, principally 
stratification and sophisticated weighting methods. However, the 
features that tend to increase variances usually dominate. 4 

The design effect is a measure of the extent to which the 
interactions of all such features affect the sampling variance. It is 
defined as the factor by which the variance of an estimate is 
changed through departure from simple random sampling. A 
design effect value of 1 indicates that the sample design is as
efficient as a simple random sample. A value greater than 1 
indicates a tendency for greater sampling error since a more 
complex and less statistically efficient design was used. Once the 
sampling plan has been determined, the design effect may be 
estimated using a statistical package, such as SAS. However, for 
new surveys, the design effect is usually not known. An estimate 
probably can be made using data from other surveys in similar 
areas. 

For a well-designed survey, the design effect usually ranges from 1 
to 3. But it is not uncommon for the design effect to be much 
larger, up to 7 or 8, or even to 30. 5 

Rules need to be established for sample replacement. It is 
important to have as high a level of participation among those 
selected in the original sample as possible. 

Sampling subgroups

Information on subpopulations, such as racial or ethnic minority
groups, must be considered when determining sample size to ensure
an adequate number of respondents within these subpopulations to
produce estimates with an accepted level of precision. Population
subgroups may also be referred to as survey domains.


4 Assessment of major federal data sets for analyses of Hispanic and Asian or Pacific Islander subgroups and Native 
Americans: Extending the utility of federal data bases. Section 2. Methodological and Statistical Issues Affecting a 
Survey's Ability to Produce Adequate Precision. Department of Health and Human Services/Assistant Secretary for 
Planning and Evaluation, May 2000. Available online: http://aspe.hhs.gov/hsp/minority-db00/task3/section2.htm

5 Shackman G. Sample size and design effect. Presented at Albany Chapter of American Statistical Association, 
March 24, 2001.


If simple random sampling is not used, some adjustment is needed 
for the estimated sample design effect. For example, if cluster
sampling is used, the precision of estimates would be lower than 
would be obtained through simple random sampling. This means 
that a larger sample size must be obtained than would be required 
with simple random sampling. The design effect cannot be 
determined until after the data are collected, but it must be 
estimated in advance to determine sample size. 
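
As a rough illustration of this adjustment, the sketch below multiplies a sample size computed under simple random sampling by an anticipated design effect. The function name and the example figures are hypothetical; the design effect itself would have to be estimated from comparable earlier surveys.

```python
import math


def adjust_for_design_effect(n_srs, design_effect):
    """Inflate a simple-random-sample size by an anticipated design effect."""
    if n_srs <= 0 or design_effect <= 0:
        raise ValueError("n_srs and design_effect must be positive")
    return math.ceil(n_srs * design_effect)


# Hypothetical case: 1,068 interviews would suffice under simple random
# sampling; earlier surveys in similar areas suggest a design effect of 1.5.
n_needed = adjust_for_design_effect(1068, 1.5)
print(n_needed)
```

A design effect of 1 leaves the sample size unchanged; larger values raise it in direct proportion.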

It is important to understand the role of design effect in determining 
the precision of a survey estimate. Design effect is a factor that 
reflects the effect on the precision of a survey estimate due to the 
difference between the sample design actually used to collect the 
data and a simple random sample of respondents. Design effect can
vary not only from survey to survey but also from question to
question.

When the sampling unit is households, design effect is higher for 
characteristics that are the same for all members of the household, 
such as ethnicity, and lower for characteristics such as age and 
education. 3 

The clustering frequently results from several stages of sample 
selection, e.g., countries, provinces, municipalities, groups of 
neighboring households, and members of sample households. The 
extent to which the characteristics of persons within these levels of 
clustering tend to be correlated influences the size of the sampling 
variances. In most cases, clustering increases the sampling 
variances above what would result from a simple random sampling 
with the same sample size. Variances will also be increased if the 
sampling rates vary among members of the population. This can 
come about if some groups are oversampled or undersampled. It 
can also result from a fairly common practice of selecting a 
household sample, then choosing one member at random for a more 
detailed interview. Persons in large households then have smaller 
probabilities of selection than persons in small households. In 


3 U.S. Census Bureau. Technical paper 63: Current Population Survey design and methodology. Available online: 
http://www.bls.census.gov/cps/tp/tp63.htm 


of estimates. This is called design effect. If clusters of households 
are used to lower survey costs and reduce survey burden, the design 
effect will likely be greater than one. To compensate for this, a 
larger sample is necessary to achieve the same degree of statistical 
power. 

Identifying sampling
requirements

Sample size must be adequate to provide reasonable estimates of
the prevalence of risk behaviours and preventive health practices
among residents. Sample size can vary depending on data needs
and availability of funds.

The objectives of the surveillance system — that is, the subject 
areas for which data are desired — must be carefully evaluated. If
the behaviour of interest is uncommon in the general population 
(e.g., less than 10%), the overall sample size will have to be large 
enough to include a sufficient number of respondents engaging in 
the uncommon behaviour. 
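
A back-of-the-envelope version of this reasoning is sketched below. It assumes the expected number of respondents reporting the behaviour is simply the sample size multiplied by the prevalence; the function name and the target count of 200 are hypothetical, and the result is an expectation, not a guarantee.

```python
import math


def sample_size_for_rare_group(target_count, prevalence):
    """Overall sample size expected to yield target_count respondents
    who report a behaviour with the given prevalence."""
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    return math.ceil(target_count / prevalence)


# Hypothetical target: about 200 respondents who engage in the behaviour.
for prevalence in (0.25, 0.10, 0.05):
    n = sample_size_for_rare_group(200, prevalence)
    print(f"prevalence {prevalence:.0%}: overall sample of about {n}")
```

Halving the prevalence roughly doubles the overall sample needed to reach the same number of respondents with the behaviour.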

Information on subpopulations, such as racial or ethnic minority 
groups, must be considered when determining sample size to ensure 
an adequate number of respondents within these subpopulations to 
produce estimates with an accepted level of precision. 

Multistage cluster sampling is typically used in surveillance. 
Clustering sampled households in a limited number of geographic 
areas will help reduce the cost of data collection 2 by helping to 
keep staff and travel costs down. 

The tolerable range of error must be identified. For example, 
estimates with a range of error of ±3% will require a much larger 
sample size than estimates with a range of error of ±10%. In
addition, the desired level of confidence, indicating that the true 
value is within the tolerable range of error, has to be established. 
Typically, the 95% confidence level is used (although other levels 
can be selected). The smaller the required confidence interval, the 
greater the sample size requirement. 
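
The effect of the tolerable range of error can be sketched with the usual normal-approximation formula for a proportion under simple random sampling, n = z² p(1 − p) / e². The function name is hypothetical, and the default prevalence of 0.5 is an assumption that gives the most conservative (largest) sample size; the result would still need to be adjusted for design effect and expected nonresponse.

```python
import math
from statistics import NormalDist


def srs_sample_size(margin, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion p to within +/- margin
    under simple random sampling (normal approximation)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(srs_sample_size(0.03))   # 1068
print(srs_sample_size(0.10))   # 97
```

Tightening the range of error from ±10% to ±3% raises the requirement roughly elevenfold, because the sample size grows with the inverse square of the margin.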


2 L. Alecxih, J. Corea and D. Marker. Deriving state-level estimates from three national surveys: A statistical 
assessment and state tabulations. Department of Health and Human Services/Assistant Secretary for Planning and 
Evaluation, May 4, 1998. Available online: http://aspe.hhs.gov/health/reports/st_est/


Ensuring a scientific
approach

Because interviewing every person is not economically feasible, the
surveillance system should use a scientifically selected sample.

The sample must be representative of the whole population. This 
will ensure that the data are credible and the evidence base for 
health risk is strengthened. Information obtained from a 
scientifically selected sample can be used to generalize results to 
the total population. You will need a valid technique for drawing 
the sample. 


Using probability 
sampling 


Probability sampling is a method of random sampling in which 
each member of the population has a known, nonzero probability of 
being selected.


Several types of sampling designs will yield a probability sample. 
These sampling designs are described under “Determine sample 
methodology” in the Action Steps of this chapter. 

Assuring sampling
efficiency

A good sample design is efficient. An efficient sample produces
results that are more precise than those from other samples of the
same cost. The more precise your results, the more confidence you
can have in them. An efficient sample design is dependent on both
sample size and the selection procedures. For a given sample size,
more precise results can be obtained by conducting the survey in a
larger number of neighborhoods or towns. This is particularly true
when residents of the same town tend to be more similar to one
another than to residents of other towns.


Samples obtained using large numbers of residents in only a few 
neighborhoods will tend to reflect the characteristics of the 
particular neighborhoods that are selected. Estimates from such
samples will have relatively high variability and less precision.
In contrast, samples that are obtained using small
numbers of residents in many neighborhoods will more accurately 
reflect the population. In general, estimates from these samples 
will be close to the values for the entire population, that is, they will 
have better precision. Very large samples are usually not necessary 
because they require more documentation and more followup for 
only a slight increase in precision. 
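
The trade-off between a few large clusters and many small ones can be sketched with Kish's approximation, design effect = 1 + (m − 1) × rho, where m is the number of respondents per cluster and rho is the intraclass correlation. The function names and the value rho = 0.05 are assumptions for illustration, and the formula assumes clusters of equal size.

```python
def design_effect(cluster_size, icc):
    """Kish's approximate design effect for equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Size of the simple random sample that would give the same precision."""
    return n / design_effect(cluster_size, icc)


# 1,000 interviews in total, with an assumed intraclass correlation of 0.05.
few_large = effective_sample_size(1000, cluster_size=50, icc=0.05)   # 20 neighborhoods
many_small = effective_sample_size(1000, cluster_size=5, icc=0.05)   # 200 neighborhoods

print(f"20 neighborhoods of 50: effective n is about {few_large:.0f}")
print(f"200 neighborhoods of 5: effective n is about {many_small:.0f}")
```

With the same number of interviews, spreading them over more neighborhoods yields a much larger effective sample size, and the gap widens as residents within a neighborhood become more alike.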

Both cluster sampling and disproportionate stratified random 
sampling make sampling easier, but at a cost of decreased precision 
Calculate the
sample size

The required sample size can vary depending on data needs and
availability of funds. Sample size must be adequate to provide
reasonable estimates of the prevalence of risk behaviours and 
preventive health practices among the various subgroups on which 
you want to collect data. The more subgroups for which you want 
to collect information, the larger your sample size will need to be. 

In addition, if a behaviour of interest is uncommon in the general 
population, the overall sample size will have to be large enough to 
include a sufficient number of respondents who engage in the 
behaviour. 

Be sure to seek input and guidance from a statistical consultant. 

Determine sample
methodology

Probability sampling is a method of random sampling in which
each member of the population has a known, nonzero probability of
being selected. Several types of sampling designs will yield a
probability sample. These are:

• Simple random sample — Each "unit" in the 
surveillance population has an equal probability of being 
surveyed. 

• Stratified random sample — The population is divided 
into a specified number of subpopulations, or strata 
(e.g., north/south, urban/rural), and a sample is drawn 
independently from each stratum. Stratified random 
sampling is comparable to conducting a number of 
independent surveys of subpopulations, then combining 
data from these separate surveys (strata) to produce an 
overall estimate. 

• Cluster sample — The sampling unit is a group, or 
cluster, rather than an individual. The clusters are 
usually of equal size. Cluster sampling of households 
reduces the number of individuals to be selected in order 
to complete a sample. Cluster sampling is easier and 
less expensive but often results in increased variance for 
measurements 
ACTION STEPS


• Disproportionate stratified random sample — This is a
variation of stratified random sampling. Information from
previous surveys is used to classify units into strata that 
are either likely or unlikely to be selected. Units in the 
likely stratum are sampled at a higher rate than those in 
the unlikely stratum. 
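
The sampling designs listed above can be sketched with Python's standard `random` module. This is a minimal illustration, not a production sampling procedure: the household identifiers, stratum names, and allocations are hypothetical, and a disproportionate stratified sample is obtained here simply by giving the strata different sampling rates.

```python
import random


def simple_random_sample(units, n, rng):
    """Every unit has the same probability of selection."""
    return rng.sample(units, n)


def stratified_random_sample(strata, n_per_stratum, rng):
    """Draw an independent random sample within each stratum.

    Unequal sampling rates across strata give a disproportionate design.
    """
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for cluster_id in chosen for unit in clusters[cluster_id]]


rng = random.Random(2024)  # fixed seed so the draw can be reproduced
households = [f"HH{i:03d}" for i in range(100)]

srs = simple_random_sample(households, 10, rng)

strata = {"urban": households[:60], "rural": households[60:]}
# Rural households are sampled at a higher rate (8/40) than urban (6/60).
strat = stratified_random_sample(strata, {"urban": 6, "rural": 8}, rng)

clusters = {c: households[c * 10:(c + 1) * 10] for c in range(10)}
clus = cluster_sample(clusters, 2, rng)

print(len(srs), len(strat["urban"]), len(strat["rural"]), len(clus))
```

In each design the selection probability of every household is known, which is what makes the result a probability sample. Recording the seed and the frame used for the draw makes the selection auditable.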

Nonprobability sampling methods may produce reasonable 
estimates, but if there are unexpected findings or results, it may not 
be possible to determine if they are the result of the sampling plan 
or attributable to other variables. 

Take sources of survey error into account. These include
noncoverage error, nonresponse error, unit nonresponse, and item 
nonresponse. All steps should be taken to reduce nonresponse. 

Obtain a sampling
frame

Geographic coverage should ultimately include your entire country
unless there are specific reasons for excluding certain areas. You
may want to survey only selected areas or regions at first as you 
build capacity toward an expanded surveillance system for your 
entire country. 

If an adequate sampling frame exists from another ongoing national 
survey, such as census data collection, it could be used. The frame 
should include all of the population in the area under consideration, 
with the possible exception of geographic areas that are 
inaccessible or have widely dispersed populations. 

An existing sampling frame can be utilized only if sufficient 
information is available to demonstrate that its parameters are 
acceptable and that it is up to date. If an existing sampling frame is 
not current, it will need to be updated. 

The advantages of using an existing sample are: 

• Cost-effectiveness 

• Increased analytic power 

The disadvantages include: 
The quality and usefulness of results from your surveillance
system depend largely on the procedures used to select the 
participants. Because interviewing every person is not 
economically feasible, the surveillance system should use a 
scientifically selected representative sample. 

Action Steps to be taken for formulating a sampling plan 
include: 

1. Calculate the sample size 

2. Determine sample methodology 

3. Obtain a sampling frame 

4. Create a list of dwellings or households 

5. Select the sample 

Important Considerations to take into account when 
developing a sampling plan include: 

1. Ensuring a scientific approach 

2. Using probability sampling 

3. Assuring sampling efficiency 

4. Identifying sampling requirements 

5. Sampling subgroups 

6. Creating sample stratification

7. Decreasing sources of error 

8. Reducing nonresponse error 

9. Working with a statistical consultant 

Goals and Standards for a good sampling plan include: 
Goals: 

1. Collect information generalizable to the general 
adult population that meets characteristics as 
defined by each country. 

2. For household-based surveys, use a multistage 
sampling plan. 

Standards: 

1. Use a probability sample. 

2. Draw a different sample for each survey cycle. 